
Practical  Algorithms  for  Image  Component  Labeling 
on  SIMD  Mesh  Connected  Computers 
(Preliminary  Version) 


R.E.  Cypher*,  J.L.C.  Sanz,**  and  L.  Snyder* 


Abstract 

Two new parallel algorithms are presented for the problem of labeling the
connected components of a binary image, which is also known as the
“connected ones problem.” The machine model is an SIMD two-dimensional
mesh connected computer consisting of an N x N array of processing
elements, each containing a single pixel of an N x N image. Both new
algorithms use a “shrinking” operation defined by Levialdi and have time
complexities of O(N log N) bit operations, which makes them the fastest
local algorithms for the problem. Compared with other approaches having
similar or better time complexities, this local approach dramatically
simplifies the algorithms and reduces the constants of proportionality by
nearly two orders of magnitude, thus making them the first practical
algorithms for the problem. The two algorithms differ in the amount of
memory required per processing element; the first uses O(N) bits while the
second employs a novel compression scheme to reduce the requirement to
O(log N) bits.

1  Introduction 

The tasks encountered in machine vision can be roughly divided into three
classes based on the data structures they use. Low level tasks operate on
large 2-dimensional arrays of pixels. High level tasks operate on smaller
symbolic data structures such as graphs that are intended to describe the
scene under analysis in a manner closer to human understanding.
Intermediate level tasks link the low and high levels by taking an array
of pixels as input and creating a symbolic data structure as output.
Creating parallel architectures for intermediate level vision tasks is
particularly difficult because both symbolic and iconic (pixel array) data
structures must be accommodated. The design of parallel architectures for
the various image processing task levels is currently a topic of great
interest to the machine vision community [1-4].

This paper addresses an important intermediate level task in machine
vision: labeling the connected components of a binary image. New
algorithms are presented which run on an SIMD mesh connected computer
consisting of an N x N array of processing elements, each of which holds a
single pixel of an N x N image. The problem of image component labeling,
also known as the “connected ones problem”, consists of associating labels
with the 1-valued pixels of a binary image such that any two pixels have
the same label if and only if they lie in the same connected component,
where a connected component is a maximal region of 1-valued pixels such
that any two pixels in the region lie on a path that is connected and only
passes through pixels with value 1. The two common definitions of
connectedness are 4-connectedness and 8-connectedness. Two pixels are
4-connected if they are adjacent vertically or horizontally, and they are
8-connected if they are adjacent vertically, horizontally or diagonally
[5]. The labeling of connected components has been intensively studied
[6-10] and is important in many applications. It allows regions (the
connected components) to be identified so that the analysis of the image
can be performed on a higher level than the pixel level.

A two-dimensional mesh connected computer consists of a large number of
processing elements (PEs) arranged in a square array, as shown in Figure 1.
Each PE consists of a processor and an associated memory. For the number
of PEs in the array to approach the number of pixels in a typical image
(for example, 2^18), the PEs must be simple and inexpensive. In particular,
the PEs considered here are bit serial machines that operate in a Single
Instruction Stream, Multiple Data Stream (SIMD) mode, with all control
signals coming from a single control unit. The control unit reads
instructions from its private memory, decodes them, and broadcasts the
control signals to the PE array. In addition to broadcasting the control
information to the processors, the control unit sends addresses to the
memory units, so every PE accesses the same memory location at a given
time.

Each  PE  has  a  special  register  called  a  mask  register. 
When  an  instruction  is  sent  from  the  controller  to  the  array 
of  PEs,  only  those  PEs  with  a  1  in  their  mask  register 
perform  the  instruction;  all  others  do  nothing.  This  allows 
operations  to  be  performed  on  a  subset  of  the  PEs  in  a  data 
dependent  manner.  Of  course,  there  are  some  instructions 
which  operate  on  all  PEs  regardless  of  the  setting  of  the 
mask  registers,  thus  allowing  the  disabled  PEs  to  be  used 
again. 

The two-dimensional mesh interconnection structure is easy to construct
because it is regular, it has short connections, it requires only 4
connections per PE, and it is possible to build in two dimensions without
having any connections cross. The bit serial processors have a one bit
wide data path to their four nearest neighbors. Commercial versions of
such machines include CLIP4 [11], the MPP [12] and the GAPP [13].

This paper presents two new algorithms for labeling the connected
components of an image on mesh connected computers, together with
comparisons with previously published algorithms. The authors believe
these are the first practical algorithms for labeling connected components
on large mesh connected computers, because they have very modest
architectural requirements and are nearly two orders of magnitude faster
than previously published algorithms.


A variety of algorithms are known for connected component labeling, so it
is convenient to divide them into two classes: local algorithms and random
access algorithms. Local algorithms repeatedly change the contents of each
processor based on the contents of neighboring processors. As will be
explained in the next section, all previously published local algorithms
require O(N^2 log N) bit operations, while the local algorithms presented
here need only O(N log N) bit operations. Random access algorithms achieve
good asymptotic performance by using complex pointer manipulation
routines. For example, Nassimi and Sahni [9] give an algorithm requiring
O(N log N) bit operations and O(log N) bits of memory, matching the bounds
of the second algorithm presented here. Interestingly, though the bit
operations measure of complexity seems preferable, word operations are
also used as a measure; adopting this measure, both Nassimi and Sahni’s
algorithm and the first algorithm presented here require O(N) time, though
the algorithm given here also requires O(N) bits of memory per processor.

The importance of the new local algorithms presented here is not that they
match or nearly match the best known asymptotic complexity, but that their
constants of proportionality are very small. The complexity of random
access algorithms causes their constants to be large. For example, the
Nassimi and Sahni algorithm can be shown to require 1276 N log N +
O(N^(1/2) log N) communication operations, while the two algorithms
reported here require 12N log N + 4N and 14N log N communication
operations, respectively, in the 4-connected case. (Similar results apply
for the 8-connected case.) It seems likely that Nassimi and Sahni’s
algorithm can be modified, using techniques developed by Stout [17], to
yield an O(N) time algorithm. However, it is likely that such an algorithm
would also have a large constant of proportionality, making it inferior to
those presented here for practical values of N.

2.1  Local  Neighborhood  Algorithms 

A well known local neighborhood algorithm for labeling the connected
components of an image, which will be called the “component broadcasting
algorithm”, has a worst case time complexity of O(N^2 log N) bit
operations. Each PE containing a pixel with value 1 is initially assigned
a label that is the concatenation of its x and y coordinates. Then a
series of broadcasting operations is performed. A broadcasting operation
consists of transferring the label of each PE with a 1-valued pixel to
each of its (4- or 8-connected) neighbor PEs having a 1-valued pixel. Then
every PE calculates the minimum of its current label and the labels which
it has received, taking this minimum as its new label. The connected
components are correctly labeled when a broadcasting operation fails to
change any of the labels. Each broadcasting operation requires O(log N)
bit operations, and it will be shown that O(N^2) broadcasting operations
may be required.
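The broadcasting steps above can be sketched sequentially as follows. This is only an illustrative simulation for the 4-connected case, not the bit-serial SIMD implementation: the function name `broadcast_label`, the list-of-lists image, and the choice of `r * n + c + 1` as the "concatenated coordinates" label (offset by 1 so that 0 can mean "no label") are assumptions made for the example.

```python
def broadcast_label(image):
    """Label 4-connected components by repeated minimum-label broadcasting.

    Returns (labels, rounds), where rounds counts the broadcasting
    operations that changed at least one label.
    """
    n = len(image)
    # Initial label built from the coordinates; 0 is reserved for background.
    labels = [[r * n + c + 1 if image[r][c] else 0 for c in range(n)]
              for r in range(n)]
    rounds = 0
    while True:
        changed = False
        new = [row[:] for row in labels]
        for r in range(n):
            for c in range(n):
                if not image[r][c]:
                    continue
                # Receive the labels of the 4-connected 1-valued neighbors
                # and keep the minimum.
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= rr < n and 0 <= cc < n and image[rr][cc]:
                        if labels[rr][cc] < new[r][c]:
                            new[r][c] = labels[rr][cc]
                            changed = True
        labels = new
        if not changed:
            return labels, rounds
        rounds += 1


labels, rounds = broadcast_label([[1, 1, 0],
                                  [0, 0, 0],
                                  [0, 1, 1]])
print(labels, rounds)
```

Because a label can only advance one pixel per broadcasting operation, the number of rounds grows with the length of the longest path inside a component.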

Another O(N^2 log N) time local neighborhood algorithm is given by C. Dyer
and A. Rosenfeld [14]. It consists of first identifying a special pixel in
each component and assigning a unique label to each of the special pixels.
The special pixels are identified by using an algorithm developed by
S. Kosaraju [15]. The next step consists of building a minimum spanning
tree for each component that is rooted at a special pixel. The labels are
then broadcast from the special pixels to the other PEs in the component
using the spanning trees. This operation is very similar to the component
broadcasting algorithm given above.

In order to analyze the time complexity of the above algorithms, it is
useful to introduce two new terms. The “intrinsic distance” between two
pixels in the same connected component is one less than the number of
pixels in the shortest (4- or 8-connected) path between them composed only
of pixels with value 1. The “intrinsic diameter” of a connected component
is the largest intrinsic distance between any pair of pixels in the
component. The number of broadcasting operations required by the above
algorithms is proportional to the largest of the intrinsic diameters of
the connected components in the image. In images with small, convex
connected components, the intrinsic diameters are small and the algorithms
provide the desired labeling quickly. But some images have very long and
thin connected components with intrinsic diameters proportional to the N^2
area of the image. One such example is shown in Figure 2. When it is safe
to assume that no connected components will have an intrinsic diameter
greater than N, the above algorithms may be the best possible.
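To make the two definitions concrete, the following sketch computes the largest intrinsic diameter of an image by breadth-first search under 4-connectedness. The names `intrinsic_diameter` and `serpentine` are hypothetical, and the serpentine image is only an assumed stand-in for the spiral of Figure 2: it is a single thin component whose intrinsic diameter grows roughly like N^2/2.

```python
from collections import deque


def intrinsic_diameter(image):
    """Largest intrinsic distance (4-connected) between two 1-valued pixels
    of the same component, found by a breadth-first search from every pixel."""
    n = len(image)
    best = 0
    for sr in range(n):
        for sc in range(n):
            if not image[sr][sc]:
                continue
            dist = {(sr, sc): 0}
            queue = deque([(sr, sc)])
            while queue:
                r, c = queue.popleft()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < n and 0 <= cc < n and image[rr][cc]
                            and (rr, cc) not in dist):
                        dist[(rr, cc)] = dist[(r, c)] + 1
                        queue.append((rr, cc))
            best = max(best, max(dist.values()))
    return best


def serpentine(n):
    """Assumed worst-case image: full even rows joined alternately at the
    right and left ends, forming one long, thin component."""
    img = [[0] * n for _ in range(n)]
    for r in range(0, n, 2):
        img[r] = [1] * n
    for r in range(1, n, 2):
        if r + 1 < n:
            img[r][n - 1 if (r // 2) % 2 == 0 else 0] = 1
    return img


print(intrinsic_diameter([[1, 1], [1, 1]]))   # compact block
print(intrinsic_diameter(serpentine(5)))      # long, thin component
```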

The first new algorithm given here performs connected components labeling
in O(N log N) bit operations in the worst case using O(N) bits of memory.
This algorithm, to be called the “component shrinking algorithm”, is based
on repeated application of a binary morphological operation defined by
S. Levialdi [16]. The value of pixel P(i,j) is determined from the
previous values of pixels P(i,j), P(i+1,j), P(i,j-1) and P(i+1,j-1). For
8-connectedness, the new value of pixel P(i,j) is defined to be

h(h(P(i,j) + P(i,j-1) + P(i+1,j) - 1) + h(P(i,j) + P(i+1,j-1) - 1))

where h(t) is the “Heaviside” function defined as follows: h(t) = 0 for
t ≤ 0, h(t) = 1 for t > 0. The 2x2 neighborhoods that create a 0 are shown
in the upper part of Figure 3.

The effect of this operation, called the “shrinking operation”, can be
easily understood as follows. Assume that pixel P(i,j) is in row i and
column j and that pixel P(0,0) is in the lower lefthand corner of the
image. Then if pixel P(i,j) originally has value 1, it will have value 1
after the shrinking operation if and only if at least one of its three
neighbors to the left, above, or diagonally left and above has a 1. If
pixel P(i,j) originally has value 0, it will have value 1 after the
shrinking operation if and only if both its neighbor to the left and its
neighbor above have 1s. Levialdi proved that when this shrinking operation
is applied in parallel to all pixels in an image, only 1s which do not
disconnect a component will be erased and that 0s do not become 1s when
this would connect previously unconnected components. The shrinking
operation has the effect of squeezing each connected component into the
lower righthand corner of its bounding box until only 1 pixel remains,
which is then deleted by the next shrinking operation. An example is shown
in the lower part of Figure 3. The number of shrinking operations required
to shrink an object until it contains only 1 pixel is at most the distance
from the lower righthand corner of the object’s bounding box to the most
distant pixel in the object, where the distance between the points is
measured using the Manhattan metric: the distance from (x1,y1) to (x2,y2)
is |x1-x2| + |y1-y2|. As a result, every connected component will have
disappeared after 2N shrinking operations.
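The shrinking operation described above can be written directly from the Heaviside formula. The sketch below is a sequential illustration, not the mesh implementation; the names `shrink` and `steps_to_vanish` are assumptions. Note that it stores row 0 at the top of the image (the usual array convention), so the neighbor "above" is at row r-1, whereas the text places P(0,0) in the lower lefthand corner. Pixels outside the image are treated as 0.

```python
def shrink(image):
    """Apply one Levialdi shrinking operation (8-connected case)."""
    n = len(image)

    def px(r, c):
        # Pixels outside the array are treated as 0.
        return image[r][c] if 0 <= r < n and 0 <= c < n else 0

    def h(t):
        # Heaviside function: 0 for t <= 0, 1 for t > 0.
        return 1 if t > 0 else 0

    return [[h(h(px(r, c) + px(r, c - 1) + px(r - 1, c) - 1)
               + h(px(r, c) + px(r - 1, c - 1) - 1))
             for c in range(n)]
            for r in range(n)]


def steps_to_vanish(image):
    """Number of shrinking operations until the image is empty."""
    steps = 0
    while any(any(row) for row in image):
        image = shrink(image)
        steps += 1
    return steps


block = [[1, 1],
         [1, 1]]
print(shrink(block))           # the upper left pixel is erased first
print(steps_to_vanish(block))  # stays within the 2N bound (N = 2 here)
```

In this small example the block is squeezed toward its lower righthand corner, reaches a single pixel after two operations, and is deleted by the third.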

Levialdi uses the shrinking operation to count the number of connected
components. In his algorithm, whenever a connected component disappears, a
special marker is created which then moves to the lower righthand corner
of the array. Whenever two special markers arrive at the same location, a
new marker which represents the sum of the previous markers is created.
The marker which arrives at the lower righthand corner after 2N iterations
represents the number of connected components in the image.

The component shrinking algorithm is based on Levialdi’s shrinking
operation and operates in two phases. In the first phase, Levialdi’s
shrinking operation is applied in parallel to the entire image 2N times.
After each shrinking operation, a different image is obtained. The result
of applying the shrinking operation y times to the original image will be
called “partial result y”. Assume that partial result y is stored in
memory location y in the PEs.

In the second phase, the labels are assigned by examining the partial
results in reverse order, starting with the empty image that resulted from
the final shrinking operation. Stage y of the second phase, y ranging from
2N to 1, consists of first transferring the label from each PE (i,j)
having a 1 in memory location y to those PEs (i,j), (i+1,j), (i,j-1) and
(i+1,j-1) having a 1 in memory location y-1, that is, to the PEs whose
pixels determined the value of P(i,j) in the shrinking operation. Call
this the “first assignment.” Next, any PE (i,j) which has a 1 in memory
location y-1 and which has not received a label generates a new label
which is the concatenation of the numbers y, i and j. Call this the
“second assignment.”


After processing all values of y from 2N to 1, each connected component
will have a unique label. This can be seen by noting that a new label is
created for exactly those pixels which became isolated 1s during the
shrinking process. Because every component is shrunk to an isolated 1
which exists for only one stage, there is a unique label, created by the
second assignment, for every connected component. The label for a
component is transferred from stage y to stage y-1 in a way that ensures
that it is sent to all pixels at stage y-1 which correspond to the same
component at stage y, and to no others, as will now be shown.

It should be evident by inspection of the shrinking rules that every PE
with a (nonisolated) 1 at stage y-1 receives one or more labels in the
first assignment. If these are all the same then it is the label assigned
to the same component at stage y and, by induction, the label is correct.
What remains is showing that the labels received in the first assignment
are all the same. If a PE were to get different labels, then it would be
part of a single component in partial result y-1 that shrank to different
isolated 1s. Because this is impossible given Levialdi’s rules, the labels
received by a processor during the first assignment must be consistent.
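The two phases can be simulated sequentially as in the sketch below. It is an illustration under stated assumptions rather than the SIMD code: `label_components` and `shrink` are hypothetical names, row 0 is stored at the top of the image, and each pixel of partial result y-1 pulls a label from itself and from its south, east and southeast neighbors in the labels of partial result y, mirroring the Label routine of Figure 7. New labels are formed as y*N^2 + row*N + column, as in that routine.

```python
def shrink(image):
    """One Levialdi shrinking operation; row 0 is the top row."""
    n = len(image)
    px = lambda r, c: image[r][c] if 0 <= r < n and 0 <= c < n else 0
    h = lambda t: 1 if t > 0 else 0
    return [[h(h(px(r, c) + px(r, c - 1) + px(r - 1, c) - 1)
               + h(px(r, c) + px(r - 1, c - 1) - 1))
             for c in range(n)] for r in range(n)]


def label_components(image):
    n = len(image)
    # Phase 1: keep every partial result (O(N) bits per pixel).
    partial = [image]
    for _ in range(2 * n):
        partial.append(shrink(partial[-1]))

    # Phase 2: process partial results in reverse order.
    labels = [[0] * n for _ in range(n)]          # labels for partial result 2N
    for y in range(2 * n, 0, -1):
        lab = lambda r, c: labels[r][c] if 0 <= r < n and 0 <= c < n else 0
        prev = partial[y - 1]
        new = [[0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                if prev[r][c]:
                    # First assignment: take a label passed back from stage y.
                    got = (lab(r, c) or lab(r + 1, c)
                           or lab(r, c + 1) or lab(r + 1, c + 1))
                    # Second assignment: an unlabeled 1 creates a new label.
                    new[r][c] = got or (y * n * n + r * n + c)
        labels = new
    return labels


image = [[1, 1, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 1]]
for row in label_components(image):
    print(row)
```

In this example each of the two components ends up with a single label, created at the stage where that component had shrunk to an isolated 1.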

The worst case time requirement for the “component shrinking algorithm” is
less than that required for the “component broadcasting algorithm” because
the shrinking algorithm allows labels to pass through PEs that hold image
pixels not belonging to the component. In contrast, the broadcasting
algorithm sends labels only to PEs which hold pixels belonging to the
component, so the labels must follow the contours of the components to
which they belong. As the spiral in Figure 2 demonstrates, this is very
slow in the worst case.

2.2  Log  Space  Connected  Component  Labeling  on  a  Mesh

An important limitation of the component shrinking algorithm is that it
requires O(N) bits of memory per PE, while the component broadcasting
algorithm requires only O(log N) bits of memory per PE. However, the
component shrinking algorithm can be modified so that it too requires only
O(log N) bits of memory per PE. The resulting algorithm, which will be
called the “log component shrinking algorithm”, also requires O(N log N)
time in the worst case.

The log component shrinking algorithm is the same as the component
shrinking algorithm except that only log(N)+2 partial results from the
shrinking operations are stored. The major difference between the
algorithms is that in the original algorithm every partial result y was
stored, but in this algorithm many of the partial results are not stored
and so they must be recalculated. Since in the second phase the partial
results are processed in order from the last to the first, it would be
convenient if partial result y-1 could be calculated from partial result
y. Unfortunately, this is not the case. Instead, the log(N)+2 stored
partial results must be used judiciously. The technique used is to store a
few of the partial results (those positioned at approximately 1/2, 3/4,
7/8, 15/16, etc. of the way through the sequence) and then to recreate the
missing ones. The exact rules specifying how the results are stored are
given below.

Adopt the following notation:

• y_k is the k-th bit of the binary representation of the number y, where
the least significant bit is the 0-th bit. For example, y_1 = 0 if y = 5.

• The binary representation of a nonnegative integer is written by listing
the bits within parentheses separated by commas. For example,
y = (y_m, y_{m-1}, ..., y_0) is an m+1 bit representation of y.

• Last(y) is the bit position of the rightmost 1 in the binary
representation of y, with bit position 0 being the least significant bit.
Last(0) is undefined. For example, Last(12) = 2.

• Flip(y, j) is the number with the binary representation that is the
binary representation of y with bit j complemented. For example,
Flip(7,1) = 5.

• Result_In(j) = y when partial result y is stored in memory location j.
Notice that the value of Result_In(j) will depend on when during the
algorithm it is evaluated.

Figure  4  shows  some  values  for  the  functions  Last(y)  and 
Flip(y,  Last(y)). 
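For readers who want to tabulate these functions, Last and Flip reduce to two standard bit manipulations. The sketch below is an illustration only; the lowercase names `last` and `flip` are assumptions, and the printed range 1 to 8 corresponds to the partial results for N = 4.

```python
def last(y):
    """Bit position of the rightmost 1 in y (undefined for y = 0)."""
    if y <= 0:
        raise ValueError("Last(y) is undefined for y <= 0")
    # y & -y isolates the lowest set bit of y.
    return (y & -y).bit_length() - 1


def flip(y, j):
    """y with bit j complemented."""
    return y ^ (1 << j)


print(" y  Last(y)  Flip(y, Last(y))")
for y in range(1, 9):
    print(f"{y:2d}  {last(y):7d}  {flip(y, last(y)):16d}")
```

Flip(y, Last(y)) simply clears the rightmost 1 of y, which is the same as y - 2^Last(y).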

Using these definitions, it is possible to define how the algorithm uses
the O(log N) bits of memory per PE. When partial result y is calculated
during the first phase, it is stored in memory location j where
j = Last(y). Then in the second phase, after assigning the labels for
partial result y, partial result x is required, where x = y - 1. Partial
result x is calculated by retrieving partial result v from memory, where
v ≤ x, and applying x - v shrinking operations to it. The results of these
shrinking operations are stored in the appropriate memory locations
(partial result w is stored in location Last(w)). In the log component
shrinking algorithm, v = Flip(y, Last(y)). It will be shown that by
retrieving partial result v where v is defined in this manner, only
O(N log N) shrinking operations are required during the second phase. As a
result, the algorithm operates in O(N log N) time. Figure 5 shows the
contents of the variables v, w, x and y after each call to the Shrink
function, assuming that N = 4. In addition, the partial result which is
located in each of the 4 image memory locations is shown. Note that there
are no entries for odd values of y because when y is odd, v = x and
partial result y-1 is available without performing any additional
shrinking operations. The fact that partial result v is present in memory
when partial result x is needed is treated below. A pseudo-code
representation of the algorithm and an analysis of its complexity follow.


The algorithm is specified using a modified C language syntax. The keyword
“PE” in a variable declaration indicates that every PE has a copy of the
variable, while variables declared without the “PE” keyword are stored in
the controller. The “where...elsewhere” statement is a generalization of
the “if...else” statement that allows some of the PEs to perform one set
of operations and the remaining PEs to perform a different set of
operations. The “Shift(variable, direction)” statement transfers the given
variable 1 PE in the given direction. PEs on the edge of the mesh that
would not receive data during a Shift are given a 0. It is assumed that N
is a power of 2. There is no code for the first phase of the algorithm,
because when y = 2N, v = 0 (because N is a power of 2), so partial results
1 through 2N-1 are calculated when y = 2N. This is exactly the first
phase. The code for the main routine of the log component shrinking
algorithm is given in Figure 6. The code for a number of supporting
routines is given in Figure 7.

2.3  Algorithm  Correctness 

In the log component shrinking algorithm, after assigning the labels for
partial result y, it is necessary to calculate partial result x, where
x = y-1. This is done by retrieving partial result v, where
v = Flip(y, Last(y)). If v = 0, this result is retrieved from memory
location log(N) + 1, because of the initial assignment of the original
image to that location. If v ≠ 0, this result is retrieved from memory
location Last(v). The fact that Last(v) actually does hold partial result
v, provided that v ≠ 0, rests on the following claim.

CLAIM: When the labels for partial result s are assigned in the log
component shrinking algorithm, for all k, 0 ≤ k ≤ l, if s_k = 1 then
z = Result_In(k) = (s_l, s_{l-1}, ..., s_k, z_{k-1}, ..., z_0) where
z_i = 0 for 0 ≤ i ≤ k-1.

PROOF :  Omitted. 
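Since the proof is omitted, the storage rule can at least be checked empirically. The sketch below is not a proof and not the SIMD code: it follows only the controller-level loop structure of Figure 6 and tracks which partial result number each memory location holds, verifying that every read finds the partial result it expects. The names `check_schedule`, `result_in` and `top` are assumptions introduced for the illustration.

```python
def last(y):
    return (y & -y).bit_length() - 1


def flip(y, j):
    return y ^ (1 << j)


def check_schedule(n):
    """Return True if every partial result is present when it is read.

    n must be a power of 2. Location `top` = log(n) + 1 permanently holds
    the original image, i.e. partial result 0.
    """
    top = n.bit_length()          # equals log2(n) + 1 for a power of 2
    result_in = {top: 0}          # memory location -> partial result number
    for y in range(2 * n, 0, -1):
        x = y - 1
        v = flip(y, last(y))
        for w in range(v + 1, x + 1):
            src = top if w - 1 == 0 else last(w - 1)
            if result_in.get(src) != w - 1:
                return False      # Shrink would read the wrong partial result
            result_in[last(w)] = w
        src = top if x == 0 else last(x)
        if result_in.get(src) != x:
            return False          # Label would read the wrong partial result
    return True


print(all(check_schedule(n) for n in (1, 2, 4, 8, 16, 32)))
```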

2.4  Time  Analysis 

The asymptotic complexity of the algorithm is governed by the number of
times the Shrink function in the inner for-loop is executed. Let this
number be T(N), let n = 2N and let m = log(n). Then:

\begin{align*}
T(N) &= \sum_{y=1}^{n} \left( 2^{\mathrm{Last}(y)} - 1 \right) \\
     &= \left( \sum_{y=1}^{n} 2^{\mathrm{Last}(y)} \right) - n \\
     &= \left( \sum_{t=0}^{m-1} 2^{-t-1}\, n\, 2^{t} \right) + n - n \\
     &= m\, 2^{m-1} \\
     &= \tfrac{1}{2}\, n\, m \\
     &= N(\log N + 1) = O(N \log N)
\end{align*}

Line 1 follows from the fact that the variable w varies from v+1 through
y-1, and the fact that v = Flip(y, Last(y)) = y - 2^Last(y), so for any
value of y, the inner for-loop is executed 2^Last(y) - 1 times. Line 3
follows from line 2 by noting that there are 2^(-t-1) n numbers y where
1 ≤ y < n such that Last(y) = t for each t, 0 ≤ t ≤ m-1 (recall that
n = 2^m). The additional “+n” term corresponds to y = n, for which
Last(y) = m.
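The closed form can be checked numerically by summing the inner-loop counts directly. This is a small sanity check rather than part of the analysis; the function name `shrink_calls` is an assumption.

```python
def shrink_calls(n):
    """Total inner-loop Shrink calls, summed over y = 1 .. 2n."""
    total = 0
    for y in range(1, 2 * n + 1):
        last_y = (y & -y).bit_length() - 1   # Last(y)
        total += 2 ** last_y - 1             # w runs from v+1 through y-1
    return total


for n in (1, 2, 4, 8, 16, 1024):
    log_n = n.bit_length() - 1               # log2(n) for a power of 2
    print(n, shrink_calls(n), n * (log_n + 1))
```

For each power of 2 listed, the two printed counts agree, matching T(N) = N(log N + 1).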


Two new local algorithms have been presented for the component labeling
problem, the first requiring O(N log N) bit operations in the worst case
and O(N) bits of memory per PE and the second having the same time
complexity, but requiring O(log N) bits of memory per PE. These bounds
improve on known local algorithms by a factor of N. Although it seems
likely that the techniques presented in [17] can be used to obtain an O(N)
time algorithm, such an algorithm is expected to be slower than those
given here for practical values of N. The two new algorithms are thus the
first practical algorithms for connected component labeling.

Figure 1. Mesh connected computer.

Figure 2. Spiral connected component.

Figure 3. Upper Part: configurations for Levialdi’s shrinking algorithm.
Lower Part: example of component shrinking.

Figure 4. Table for LAST and FLIP operators.

Figure 5. Memory contents after each call to the shrink function.

Algorithm

#define NOLABEL 0

/* function Log_Space_Label returns the connected component labels */
/* for the given original image: */

PE int Log_Space_Label(original_image)
PE int original_image;            /* image to be labeled */
{
    int v, w, x, y;               /* controller variables holding */
                                  /* result numbers */
    PE int image[log(N)+2];       /* O(log(N)) memory for holding */
                                  /* partial results */
    PE int old_label, new_label;  /* previous and current labels */

    image[log(N)+1] = original_image;
    old_label = NOLABEL;
    for (y = 2*N; y >= 1; --y)
    {
        x = y-1;
        v = Flip(y, Last(y));
        /* recreate partial results v+1 .. x from partial result v */
        for (w = v+1; w <= x; ++w)
        {
            if (w-1 == 0)
            {
                image[Last(w)] = Shrink(image[log(N)+1]);
            }
            else
            {
                image[Last(w)] = Shrink(image[Last(w-1)]);
            }
        }
        /* partial result 0 is the original image */
        if (x == 0)
            new_label = Label(old_label, image[log(N)+1], y);
        else
            new_label = Label(old_label, image[Last(x)], y);
        old_label = new_label;
    }
    return(new_label);
}


Figure  6.  Log  component  shrinking  algorithm. 
Algorithm Subroutines

#define NOLABEL 0

PE int Heaviside(x)
PE int x;
{
    PE int answer;

    where (x > 0)
        answer = 1;
    elsewhere
        answer = 0;
    return(answer);
}

PE int Shrink(image)
PE int image;
{
    PE int here, north, west, northwest;

    here = image;
    Shift(image, DOWN);           /* each PE now holds its north neighbor */
    north = image;
    image = here;
    Shift(image, RIGHT);          /* each PE now holds its west neighbor */
    west = image;
    Shift(image, DOWN);           /* each PE now holds its northwest neighbor */
    northwest = image;
    return(Heaviside(Heaviside(here+west+north-1)
                     + Heaviside(here+northwest-1)));
}

/* function Label returns the labels for partial result y-1: */

PE int Label(old_label, new_image, y)
PE int old_label;                 /* labels for partial result y */
PE int new_image;                 /* partial result y-1 */
PE int y;
{
    PE int answer, here, south, east, southeast;

    here = old_label;
    Shift(old_label, UP);         /* each PE now holds its south neighbor */
    south = old_label;
    old_label = here;
    Shift(old_label, LEFT);       /* each PE now holds its east neighbor */
    east = old_label;
    Shift(old_label, UP);         /* each PE now holds its southeast neighbor */
    southeast = old_label;
    where (new_image == 1)
    {
        where (here != NOLABEL)
            answer = here;
        elsewhere where (south != NOLABEL)
            answer = south;
        elsewhere where (east != NOLABEL)
            answer = east;
        elsewhere where (southeast != NOLABEL)
            answer = southeast;
        elsewhere
            answer = y*N**2 + pe_row*N + pe_col;
    }
    elsewhere
    {
        answer = NOLABEL;
    }
    return(answer);
}


Figure  7.  Algorithm  Subroutines. 